fix(update): restart systemd-managed daemons via their supervisor (#82) #89
`update install` stopped every running daemon and re-spawned it detached, escaping any external supervisor. On a systemd host the detached daemon competed with systemd for the PKC RPC port, so systemd's own unit could never bind and crash-looped on "PKC RPC port is already in use".

The daemon now records its supervisor at startup (systemd, via $INVOCATION_ID + /proc/self/cgroup) in its state file. `update install` partitions running daemons: unsupervised ones keep the SIGINT + detached-respawn path, while supervised ones are left running across the binary swap and restarted with `systemctl restart <unit>`. Legacy daemons without the field fall back to inferring the unit from /proc/<pid>/cgroup; non-Linux falls back to current behavior.

Tests: cgroup parser, supervisor detection/resolution, restart routing (pure, injectable lifecycle), the systemctl command, and an E2E proving a supervised daemon survives `update install` and is never detached-spawned (red against the old code, green after the fix).

Refs #82
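The startup detection described above ($INVOCATION_ID plus a `.service` leaf in /proc/self/cgroup) could look roughly like the sketch below. It is not the PR's code: the function names `parseSystemdUnitFromCgroup` and `detectSupervisor` are hypothetical, and the inputs are passed in as plain values (an environment map and the text of the cgroup file) so the logic stays pure. The only external fact relied on is the kernel's `hierarchy-ID:controller-list:cgroup-path` line format.

```typescript
// Hypothetical names; a sketch of the idea, not the code from this PR.
type Supervisor = { kind: "systemd"; unit: string };

// Extracts a systemd unit name from the text of /proc/<pid>/cgroup.
// Each line has the form "hierarchy-ID:controller-list:cgroup-path".
function parseSystemdUnitFromCgroup(cgroupText: string): string | undefined {
  for (const line of cgroupText.split("\n")) {
    const first = line.indexOf(":");
    if (first < 0) continue;
    const second = line.indexOf(":", first + 1);
    if (second < 0) continue;
    const controllers = line.slice(first + 1, second);
    // cgroup v2 lines have an empty controller list; on cgroup v1 the
    // systemd hierarchy is the one named "name=systemd".
    if (controllers !== "" && controllers !== "name=systemd") continue;
    const path = line.slice(second + 1).trim();
    const leaf = path.slice(path.lastIndexOf("/") + 1);
    if (/\.service$/.test(leaf)) return leaf;
  }
  return undefined;
}

// Treats the process as systemd-supervised only when both signals agree:
// INVOCATION_ID is set and the cgroup leaf is a .service unit.
function detectSupervisor(
  env: Record<string, string | undefined>,
  cgroupText: string,
): Supervisor | undefined {
  if (!env["INVOCATION_ID"]) return undefined;
  const unit = parseSystemdUnitFromCgroup(cgroupText);
  return unit ? { kind: "systemd", unit } : undefined;
}
```

In a daemon this would presumably be fed the process environment and the contents of /proc/self/cgroup at startup. Both signals are inherited by child processes of a service, so on their own they show unit membership rather than direct supervision.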
…stemd-managed (issue #92)

$INVOCATION_ID and a .service cgroup leaf are inherited by every descendant of any systemd service (e.g. all processes inside a GitHub Actions runner, hosted-compute-agent.service), so unit membership alone misidentified daemons as supervised and update install skipped restarting them (failing master CI on ubuntu since PR #89).

detectSelfSupervisor and the legacy cgroup fallback in resolveDaemonSupervisor now also require the unit's MainPID (via systemctl show) to be the daemon itself or its direct parent (covering ExecStart shell wrappers). When MainPID can't be read the daemon is treated as unsupervised, restoring the pre-#82 direct-restart path.
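The MainPID rule from this commit message can be illustrated with a small pure check. This is a sketch under assumptions, not the repository's implementation: `parseMainPid` and `isSupervisedBy` are hypothetical names, and the caller is assumed to have already captured the output of `systemctl show --property MainPID --value <unit>` (or `undefined` if that call failed).

```typescript
// Hypothetical helpers illustrating the MainPID rule; not the PR's code.

// Accepts either "1234" (with --value) or "MainPID=1234" (without it).
function parseMainPid(output: string): number | undefined {
  const text = output.trim().replace(/^MainPID=/, "");
  if (!/^\d+$/.test(text)) return undefined;
  const pid = Number(text);
  // systemd reports MainPID=0 when the unit has no main process.
  return pid > 0 ? pid : undefined;
}

// A daemon counts as supervised only if the unit's MainPID is the daemon
// itself or its direct parent (the ExecStart shell-wrapper case).
// An unreadable MainPID falls back to "unsupervised".
function isSupervisedBy(
  mainPidOutput: string | undefined,
  daemonPid: number,
  daemonParentPid: number,
): boolean {
  if (mainPidOutput === undefined) return false;
  const mainPid = parseMainPid(mainPidOutput);
  if (mainPid === undefined) return false;
  return mainPid === daemonPid || mainPid === daemonParentPid;
}
```

Under this rule a daemon started deep inside a CI runner's service would be treated as unsupervised, because the runner unit's MainPID is neither the daemon nor its direct parent.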
Why
The systemd-aware `update install` fix from #82/#83 was merged into `chore/upgrade-kubo-0.42.0` rather than `master`, so it never reached a release. Published 0.19.70 (= master) does not contain it. On a systemd-supervised host, `bitsocial update install` therefore still: […] (`Restart=on-failure`) respawns it, and […] so two daemons race for the RPC port; the updater's loses, dies after printing its flags, and leaves an orphan log that wedges `bitsocial logs -f`. Observed live on the `new-plebbit` prod host today.

What

Cherry-pick of the isolated fix commit (d995ebb) onto latest master, with no version regression and no unrelated churn from the stale kubo branch. Makes `update install` supervisor-aware: a daemon running under systemd is restarted via `systemctl restart <unit>` instead of being killed and re-spawned by the updater (which competes with the supervisor for the RPC port).

Includes:

- `src/update/restart-orchestration.ts`: partition daemons into supervised vs. updater-managed; route restarts accordingly.
- `src/update/systemctl.ts`: thin `systemctl restart` wrapper.
- `src/common-utils/daemon-state.ts`: record/resolve a daemon's supervisor (systemd unit via cgroup); record it at startup, fall back to live-cgroup inference for legacy daemons.
- `src/cli/commands/update/install.ts`: wait for full process exit (not just RPC port free) before the swap, and restart supervised daemons through their supervisor.

Verification

`npm run build && npm run build:test` pass against latest master.

Supersedes the stale `chore/upgrade-kubo-0.42.0` branch carrying #83 (that branch is behind master and would regress the version).
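The supervised vs. updater-managed split listed under "Includes" can be pictured as a pure planning step that turns recorded daemon state into restart actions. The sketch below is illustrative only: `DaemonState`, `RestartAction` and `planRestarts` are hypothetical names and shapes, not the contents of `restart-orchestration.ts`. The only real command it encodes is `systemctl restart <unit>`.

```typescript
// Hypothetical types and names; a sketch of the routing idea only.
type Supervisor = { kind: "systemd"; unit: string };

interface DaemonState {
  pid: number;
  supervisor?: Supervisor; // absent for updater-managed daemons
}

type RestartAction =
  | { via: "systemctl"; argv: string[] }
  | { via: "respawn"; pid: number };

// Supervised daemons are restarted once per unit through systemctl;
// everything else keeps the stop-and-respawn-detached path.
function planRestarts(daemons: DaemonState[]): RestartAction[] {
  const units: string[] = [];
  const respawns: RestartAction[] = [];
  for (const daemon of daemons) {
    if (daemon.supervisor) {
      // De-duplicate so a unit is never restarted twice.
      if (units.indexOf(daemon.supervisor.unit) < 0) {
        units.push(daemon.supervisor.unit);
      }
    } else {
      respawns.push({ via: "respawn", pid: daemon.pid });
    }
  }
  const restarts = units.map(
    (unit): RestartAction => ({
      via: "systemctl",
      argv: ["systemctl", "restart", unit],
    }),
  );
  return [...restarts, ...respawns];
}
```

Keeping the plan as plain data means the routing can be unit-tested without touching systemd; a separate executor would then run each `argv` or perform the respawn.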